Autosplitter for big generated java methods #10367

Okapist · 2022-08-05T16:51:55Z

All unit tests ok with very small kMaxFieldsInMethod=4

Round trip benchmark

ByteArrayOutputStream baos = new ByteArrayOutputStream();
protoOld.writeTo(baos);

orig.protobuf_unittest.TestAnyProto.TestAny testAny =
        orig.protobuf_unittest.TestAnyProto.TestAny.parseFrom(baos.toByteArray());
blackhole.consume(testAny);

java -version
openjdk version "17.0.3" 2022-04-19
OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, sharing)

Check small proto with 15 field. All fields inited

Benchmark      Mode  Cnt    Score   Error  Units
Test.checkNew  avgt    2  836,389          ns/op
Test.checkOld  avgt    2  933,154          ns/op

Check small proto with 15 field. 2 fields inited, 14 empty

Benchmark                    Mode  Cnt    Score   Error  Units
Test.checkNew15fields13empy  avgt    2  251,938          ns/op
Test.checkOld15fields13empy  avgt    2  334,046          ns/op

Check mid size proto. Here split code work and JIT work on old and on new code.

Benchmark              Mode  Cnt     Score   Error  Units
Test.checkNew70fields  avgt    2  1637,487          ns/op
Test.checkOld70fields  avgt    2  1621,035          ns/op

Check big proto. Around 900 fields. Here split code work and JIT not work on old code.

Benchmark                    Mode  Cnt      Score   Error  Units
TestVisit.checkNew800fields  avgt    2   6239,123          ns/op
TestVisit.checkOld800fields  avgt    2  36872,855          ns/op

I see no difference or slight increase speed in small or medium proto files and great speed increase in large proto files.

google-cla · 2022-08-05T16:51:59Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

fowles · 2022-08-05T17:17:05Z

We will need you to sign the CLA before we can review this change.

Signed-off-by: Okapist <anonbk@gmail.com>

Okapist · 2022-08-06T21:01:27Z

We will need you to sign the CLA before we can review this change.

CLA signed

fowles · 2022-08-07T00:02:18Z

Just a warning that the best person to review this is out for a bit, so you should expect silence for at least a week

googleberg · 2022-08-18T18:26:37Z

Back in the office. Reviewing.

…

On Sat, Aug 6, 2022 at 6:02 PM Matt Fowles Kulukundis < ***@***.***> wrote: Just a warning that the best person to review this is out for a bit, so you should expect silence for at least a week — Reply to this email directly, view it on GitHub <#10367 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AZRRDX4WYMMW63Q6YNMAJULVX34JLANCNFSM55WVVZTA> . You are receiving this because your review was requested.Message ID: ***@***.***>

-- Jerry Berg | Software Engineer | ***@***.*** | 720-808-1188

googleberg · 2022-08-18T19:01:40Z

@Okapist these are some intriguing results. Thank you for trying this out!

Before we go down this path of introducing complexity through splitting and continue to expand the size of generated message code, I'd like us to try comparing the performance using the Lite approach to these methods.

See GeneratedMessageLite.hashCode, .equals, .writeTo which use the Schema utility to implement these methods.

Would you be willing to take a crack at that for comparison? I have a pressing project that will likely delay me from investigating that approach myself. The goal is to balance performance with the maintenance load from having so many approaches to the problem.

Okapist · 2022-08-23T10:52:59Z

Yep.
No changes. It's generates the absolutely same java code for MessageLite. There no big methods generation in lite approach.

Performance check show no difference as expected.

Benchmark                     Mode  Cnt    Score   Error  Units
Test.checkNew70fieldsWriteTo  avgt    2  810,307          ns/op
Test.checkOld70fieldsWriteTo  avgt    2  835,257          ns/op

Benchmark                       Mode  Cnt     Score   Error  Units
Test.checkNew70fieldsRoundtrip  avgt    2  1842,443          ns/op
Test.checkOld70fieldsRoundtrip  avgt    2  1968,669          ns/op

Benchmark                    Mode  Cnt  Score   Error  Units
Test.checkNew70fieldsEquals  avgt    2  0,822          ns/op
Test.checkOld70fieldsEquals  avgt    2  0,889          ns/op

googleberg · 2022-08-23T16:36:52Z

@Okapist Thanks for running this!

If I understand, these are results for running the other methods with the split in place. Which will be useful for the comparison I really want to run which is using the Lite implementation of these methods vs current impl and split impl.

We could then decide if we should introduce splitting or use the Lite implementation.

Okapist · 2022-08-29T14:59:54Z

Lite implementation is slow. Usually we use serialize and deserialize only. Lite is 20-30% slower than normal.
Use normal implementation for all cases except big proto is frustrating and hard to support.

googleberg

Before we proceed with this implementation, what is the reason for incrementally deciding to split vs just looking at the number of fields up front and branching within the top-level method?

googleberg · 2022-10-06T17:27:45Z

src/google/protobuf/compiler/java/helpers.cc

@@ -1120,6 +1122,22 @@ void EscapeUtf16ToString(uint16_t code, std::string* output) {
  }
 }

+void MaybeSplitJavaMethod(io::Printer *printer, int* fields_in_function, int* method_num,


Why do we MaybeSplitJavaMethod? Can't we know ahead of time (by the number of fields) how we need to split the top-level method?

ljw-hit · 2023-12-06T03:31:27Z

@Okapist @googleberg Hi, I would like to know the current status of this PR. Has the ability of autosplitter been integrated? I think this feature is still very important. Big proto will seriously affect the performance of serialization and deserialization in Java. Even if you encounter a proto with too many lines and the generated java method exceeds 64k, it cannot be used at all.

Okapist changed the title ~~Protocolbuffers main~~ Autosplitter for big generated methods Aug 5, 2022

Okapist changed the title ~~Autosplitter for big generated methods~~ Autosplitter for big generated java methods Aug 5, 2022

fowles requested a review from googleberg August 5, 2022 17:16

Okapist force-pushed the protocolbuffers-main branch from 57c342b to cbb0e87 Compare August 5, 2022 17:17

fowles added java release notes: yes labels Aug 5, 2022

Okapist force-pushed the protocolbuffers-main branch 2 times, most recently from 133075f to 04739e5 Compare August 5, 2022 18:10

Splitter for big java methods

ef88e38

Signed-off-by: Okapist <anonbk@gmail.com>

Okapist force-pushed the protocolbuffers-main branch from 04739e5 to ef88e38 Compare August 5, 2022 18:15

Merge branch 'main' into protocolbuffers-main

ebb4488

googleberg reviewed Oct 6, 2022

View reviewed changes

googleberg self-assigned this Oct 13, 2022

deannagarcia added the wait for user action label Jan 5, 2023

haberman closed this Feb 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Autosplitter for big generated java methods #10367

Autosplitter for big generated java methods #10367

Okapist commented Aug 5, 2022

google-cla bot commented Aug 5, 2022

fowles commented Aug 5, 2022

Okapist commented Aug 6, 2022

fowles commented Aug 7, 2022

googleberg commented Aug 18, 2022 via email

googleberg commented Aug 18, 2022

Okapist commented Aug 23, 2022 •

edited

Loading

googleberg commented Aug 23, 2022

Okapist commented Aug 29, 2022

googleberg left a comment

googleberg Oct 6, 2022

ljw-hit commented Dec 6, 2023

Autosplitter for big generated java methods #10367

Autosplitter for big generated java methods #10367

Conversation

Okapist commented Aug 5, 2022

google-cla bot commented Aug 5, 2022

fowles commented Aug 5, 2022

Okapist commented Aug 6, 2022

fowles commented Aug 7, 2022

googleberg commented Aug 18, 2022 via email

googleberg commented Aug 18, 2022

Okapist commented Aug 23, 2022 • edited Loading

googleberg commented Aug 23, 2022

Okapist commented Aug 29, 2022

googleberg left a comment

Choose a reason for hiding this comment

googleberg Oct 6, 2022

Choose a reason for hiding this comment

ljw-hit commented Dec 6, 2023

Okapist commented Aug 23, 2022 •

edited

Loading